Abstract

A fast simulated annealing algorithm is developed for automatic object recognition. The object recognition problem is addressed as the problem of best describing a match between a hypothesized object and an image. The normalized correlation coefficient is used as a measure of the match. Templates are generated on-line during the search by transforming model images. Simulated annealing reduces the search time by orders of magnitude with respect to an exhaustive search. The algorithm is applied to the problem of how landmarks, for example, traffic signs, can be recognized by an autonomous vehicle or a navigating robot. Images are assumed to be taken while the robot or the vehicle is moving through its environment. The algorithm tries to match them with templates created on-line from models stored in a database. We illustrate the performance of our algorithm with real-world images of complicated scenes with traffic signs. False positive matches occur only for templates with very small information content. To avoid false positive matches, we propose a method to select model images for robust object recognition by measuring the information content of the model images. The algorithm works well in noisy images for model images with high information content.
1 The Recognition Problem

An object in an image $I$ is defined to be recognized if it correlates highly with a template image $T$ of the hypothesized object. This template image $T$ is a transformed version of the model of the hypothesized object. Model images of objects are stored in a library. Section 2 shows how to compute the template from the model. A template $T(x, y)$, for $0 \le x < n_T$, $0 \le y < m_T$, is generally much smaller than the image $I(x, y)$. The template is compared with the part $I_T(x, y)$ of image $I(x, y)$ that contains the hypothesized object. Assuming pixel $(x_0, y_0)$ is at the lower-left corner of the hypothesized object in $I$, subimage $I_T$ is defined to be

$$I_T(x, y) = I(x_0 + x,\; y_0 + y) \quad \text{for } 0 \le x < n_T,\ 0 \le y < m_T.$$

We use the normalized correlation coefficient as a measure of how well images $I_T$ and $T$ correlate or match. For images $I_T$ and $T$, the normalized correlation coefficient $\rho$ is the covariance of $I_T$ and $T$ normalized by the standard deviations of $I_T$ and $T$. The correlation coefficient is dimensionless, and $|\rho| \le 1$. It measures how accurately image $I_T$ can be approximated by template $T$; image $I_T$ and template $T$ are perfectly correlated if $\rho = 1$. We approximate $\rho$ using the sampled coefficient of correlation

$$r = \frac{p_T \sum_{x,y} I_T(x, y)\, T(x, y) - \sum_{x,y} I_T(x, y) \sum_{x,y} T(x, y)}{\sigma_{I_T}\, \sigma_T},$$

where

$$\sigma_{I_T} = \sqrt{p_T \sum_{x,y} I_T(x, y)^2 - \Big(\sum_{x,y} I_T(x, y)\Big)^2}, \qquad \sigma_T = \sqrt{p_T \sum_{x,y} T(x, y)^2 - \Big(\sum_{x,y} T(x, y)\Big)^2},$$

and $p_T$ is the number of pixels in the template image $T$ with nonzero brightness values, so that $p_T \le n_T m_T$. Note that this last condition means that not all the pixels in images $T$ and $I_T$ are actually compared, but only the nonzero pixels in $T$ with the corresponding pixels in $I_T$. This is important, for example, if the template contains a circular object: here the pixels in $T$ bordering the circle (the background) will be zero (black). The computation time of $r$ is proportional to the number of pixels in the hypothesized object, which is usually much smaller than the number of pixels in $I$. Using the correlation as a measure of successful recognition is also advantageous because it is a very robust measure. That is, it is relatively insensitive to fluctuations in the environment compared to higher-resolution methods, as is well documented in spectral, bearing, and range estimation problems [Joh82, BKM93].
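The sampled coefficient $r$ over the nonzero template pixels can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `masked_correlation` is hypothetical, images are assumed to be lists of rows indexed as `image[y][x]`, and $(x_0, y_0)$ is taken to be the array index of the template's first row and column (the lower-left corner in the paper's convention).

```python
import math


def masked_correlation(image, template, x0, y0):
    """Sampled correlation coefficient r between template T and the subimage
    I_T whose corner is at (x0, y0). Only nonzero template pixels are used,
    so p_T is the number of nonzero pixels in T (hypothetical helper)."""
    p = 0
    s_i = s_t = s_ii = s_tt = s_it = 0.0
    for y, row in enumerate(template):
        for x, t in enumerate(row):
            if t == 0:
                continue  # zero (black) template pixels are background: skip
            i = image[y0 + y][x0 + x]
            p += 1
            s_i += i
            s_t += t
            s_ii += i * i
            s_tt += t * t
            s_it += i * t
    # sigma_{I_T} and sigma_T as defined above (clamped against rounding).
    sigma_i = math.sqrt(max(p * s_ii - s_i * s_i, 0.0))
    sigma_t = math.sqrt(max(p * s_tt - s_t * s_t, 0.0))
    if sigma_i == 0.0 or sigma_t == 0.0:
        return 0.0  # constant window or empty template: r is undefined
    return (p * s_it - s_i * s_t) / (sigma_i * sigma_t)
```

Because $r$ depends only on centered sums, replacing every image value $v$ by $a v + b$ with $a > 0$ leaves the result unchanged up to rounding, which is the brightness invariance discussed in Section 5.1. The sketch returns 0.0 whenever either standard deviation vanishes, including the degenerate case $p_T = 0$.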

2 Generating Templates from Model Images

A template $T(x, y)$ is generated from a model image $M(x, y)$ by choosing three parameters that describe a transformation from $M$ into $T$. The parameters determine how the model is sampled and, if necessary, how it is interpolated to generate the template. The parameters used are a rotation parameter $r$ and two sampling parameters $s_x$ and $s_y$.


For notational convenience, we define the origin of a coordinate system for model image $M(x, y)$ to be in the middle of the image, i.e., $M(x, y)$ is defined for $-(n_M - 1)/2 \le x \le (n_M - 1)/2$ and $-(m_M - 1)/2 \le y \le (m_M - 1)/2$ for $n_M$, $m_M$ odd. Then the rotation parameter $r$ determines how the $x$ and $y$ axes of $M(x, y)$ are rotated to define the $x$ and $y$ axes of $T(x, y)$. More precisely, given vectors

$$\mathbf{m}_x = \Big(\tfrac{n_M - 1}{2},\ 0\Big) \quad \text{and} \quad \mathbf{m}_y = \Big(0,\ \tfrac{m_M - 1}{2}\Big),$$

which lie on the coordinate axes of $M$, and model radius $R_M = \sqrt{\big(\tfrac{n_M - 1}{2}\big)^2 + \big(\tfrac{m_M - 1}{2}\big)^2}$, we compute vectors

$$\mathbf{t}_x = R_M\,(\cos r,\ \sin r) \quad \text{and} \quad \mathbf{t}_y = R_M\,(-\sin r,\ \cos r),$$

which define the coordinate axes of the template image $T$ in continuous space. The axes of $T$ always span the model object, as shown in Figure 1.

The sampling parameters $s_x$ and $s_y$ determine how many samples along vectors $\mathbf{t}_x$ and $\mathbf{t}_y$, respectively, are used for the template image. The spacing between the samples along $\mathbf{t}_x$ is $((n_M - 1)/2)/s_x$. If there is a pixel in $M(x, y)$ after every $(n_M - 1)/(2 s_x)$ step along $\mathbf{t}_x$, its brightness is used to define $T$ along its $x$-axis. For example, this scenario may occur if $r = 45$ degrees and $s_x = (n_M - 1)/2$. As shown in Figure 1, if $s_x = (n_M - 1)/4$, the model is down-sampled and transformed into a template that is about one-quarter the size of the model. Pixels of zero brightness are added where necessary, as shown in Figure 1.

In general, there may not be a pixel in $M$ at the sampling point on vector $\mathbf{t}_x$. If this is the case, we use a four-point interpolation to define the brightness of the template at that point. Similarly, $M$ is sampled (and, if necessary, interpolated) along vectors $\mathbf{t}_y$, $-\mathbf{t}_x$, and $-\mathbf{t}_y$ to obtain the brightness of the template pixels along the template coordinate axes. The rest of the template is then determined from $M$ along the grid that is defined by the samples on the template coordinate axes.

Since the sampling rates $s_x$ and $s_y$ in the template coordinate system are in general different, the template is a rotated, scaled, and uniformly deformed version of the model. More parameters would be needed to describe more general non-uniform and non-linear deformations of the model. A straightforward extension would be to add a fourth parameter to obtain a non-uniform linear deformation of the model. However, for our purposes, the transformation described is sufficient because the objects to be recognized are usually flat, normal to the viewing direction, and far away from the camera compared to the object size. Our method computes the template very quickly by sweeping over the model image only once. The time for creating an $n_T \times m_T$ template image is $O(n_T m_T)$.
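A possible sketch of the three-parameter model-to-template transformation follows. It samples the model directly at each rotated grid point rather than sweeping once along the template axes, and it rests on several assumptions not stated in the text: the "four-point interpolation" is taken to be bilinear interpolation, $s_x$ and $s_y$ are positive integers, and the template has $(2 s_x + 1) \times (2 s_y + 1)$ pixels, which agrees with the 5 × 5 template obtained with $s_x = s_y = 2$ in Figure 1. The sample spacing $((n_M - 1)/2)/s_x$ follows the text; `make_template` is a hypothetical name.

```python
import math


def make_template(model, angle_deg, s_x, s_y):
    """Hypothetical sketch: build a (2*s_y+1) x (2*s_x+1) template from a model
    image (list of rows, odd width n_M and odd height m_M, origin at the centre
    pixel) by sampling along axes rotated by angle_deg with bilinear interpolation."""
    m_m, n_m = len(model), len(model[0])
    c_x, c_y = (n_m - 1) / 2, (m_m - 1) / 2
    d_x, d_y = c_x / s_x, c_y / s_y       # spacing ((n_M-1)/2)/s_x, ((m_M-1)/2)/s_y
    a = math.radians(angle_deg)
    u_x = (math.cos(a), math.sin(a))      # unit vector along t_x
    u_y = (-math.sin(a), math.cos(a))     # unit vector along t_y
    eps = 1e-9                            # tolerance for rounding at the model border

    def sample(px, py):
        # Convert centred model coordinates to (fractional) array indices.
        ax, ay = px + c_x, py + c_y
        if ax < -eps or ay < -eps or ax > n_m - 1 + eps or ay > m_m - 1 + eps:
            return 0.0                    # outside the model: zero brightness
        ax = min(max(ax, 0.0), n_m - 1)
        ay = min(max(ay, 0.0), m_m - 1)
        x0, y0 = int(math.floor(ax)), int(math.floor(ay))
        x1, y1 = min(x0 + 1, n_m - 1), min(y0 + 1, m_m - 1)
        fx, fy = ax - x0, ay - y0
        row0 = (1 - fx) * model[y0][x0] + fx * model[y0][x1]
        row1 = (1 - fx) * model[y1][x0] + fx * model[y1][x1]
        return (1 - fy) * row0 + fy * row1

    template = []
    for j in range(-s_y, s_y + 1):        # template rows along t_y
        row = []
        for i in range(-s_x, s_x + 1):    # template columns along t_x
            px = i * d_x * u_x[0] + j * d_y * u_y[0]
            py = i * d_x * u_x[1] + j * d_y * u_y[1]
            row.append(sample(px, py))
        template.append(row)
    return template
```

With `angle_deg=0` and $s_x = (n_M - 1)/2$, $s_y = (m_M - 1)/2$ the sketch reproduces the model; with `make_template(model, 45.0, 2, 2)` on a 9 × 9 model it returns a 5 × 5 template whose grid points outside the model are set to zero brightness. Each template pixel costs constant work, so the running time is $O(n_T m_T)$ as stated above.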

Examples of a model and corresponding transformed templates are shown in Figure 2. The first two templates are scaled with $s_x = s_y$ and are not rotated. The remaining templates in Figure 2 are defined by more general transformations with $s_x \neq s_y$.
Figure 1: A 5 × 5 template image is obtained from a 9 × 9 model image using parameters $s_x = s_y = 2$ and $r = 45$ degrees.


Figure 2: Model of the slow sign with 101 × 111 pixels, and six templates of the slow sign. Templates are obtained by sampling the model sign at various sampling rates and degrees of rotation.


3 The Parameter Search Space

The space of possible solutions of the recognition problem is extremely large, even if a particular object is known a priori to be in the image. The size of the search space is determined by the number of possibilities for the position, size, shape, and orientation of the object. The number of possibilities for the position of the centroid of the object in the image is $O(n^2)$ for an $n \times n$ image. Assuming that the size and shape of the object can be approximated by sampling the model along two perpendicular axes, as described in the previous section, the number of possibilities to approximate the size and shape of the object is also $O(n^2)$. Even with this assumption, the number of possible angles is still very large; since the image is discrete, we assume that the number of possible angles is $O(n)$. Thus, the size of the search space is $O(n^5)$ for an $n \times n$ image. For a typical image of size 256 × 256, the search space has a size of order $10^{14}$. An exhaustive search of this space would take too long to find a good match between templates and images.

We use terminology from the radar and sonar literature to describe the search space. We call the space an ambiguity surface. A peak in the ambiguity surface means that the correlation coefficient is high for a particular set of parameters. Figure 3 shows an example of a two-dimensional ambiguity surface with a peak shown in black. There may be several peaks in an ambiguity surface. If the template and the object in the image match perfectly, the cross-correlation between template and image results in a peak in the ambiguity surface which is the global optimum. Due to noise and the reduction of the search space by our template transformation, we do not expect a perfect match. However, in most cases the global optimum corresponds to a correct match or recognition.

As we can also see in Figure 3, an iterative search for a peak in the ambiguity surface, such as steepest descent, would fail because it would get "stuck" in local minima. Simulated annealing, however, is able to "jump" out of local minima and can thus find the globally best correlation value.


4 The Simulated Annealing Algorithm

In this section we describe our algorithm for finding an optimal match between images and templates. Our algorithm is based on a fast version of simulated annealing. Simulated annealing has become a popular search technique for solving optimization problems. Its name originates from the process of slowly cooling molecules to form a perfect crystal. The cooling process and its analogous search algorithm are both iterative processes, controlled by a decreasing temperature parameter. At each iteration, our algorithm generates templates on-line as described in Section 2. New test values for the location, sampling, and rotation parameters of the template are obtained by randomly perturbing the current values. If the correlation coefficient $r_j$ increases over the previous coefficient $r_{j-1}$, the new parameter values are accepted in the $j$-th iteration (as in the gradient method). Otherwise, they are accepted if

$$\xi < \exp\!\left(-\frac{E_j - E_{j-1}}{T_j}\right),$$
Figure 3: On the left, image Slow3. On the right, the ambiguity surface of image Slow3 computed for all possible translations given fixed angle and scaling parameters. A deterministic search would compute each value on this surface. A steepest descent procedure would fail because of local minima. Therefore, a stochastic search is used to find the best correlation value (here the darkest pixel value).


where $\xi$ is randomly chosen in $[0, 1]$, $T_j$ is the temperature parameter, and $E_j = 1 - r_j$ is the cost function in the $j$-th iteration. For a sufficiently high temperature, this allows "jumps" out of local minima. We choose

$$T_j = T_0 / j, \qquad 1 \le j \le L,$$

as the cooling schedule for the $j$-th update of the temperature parameter, where $T_0$ is the initial temperature and $L$ is the number of iterations during the search. Note that the temperature decreases inversely linearly, as first proposed by Szu and Hartley [SH87]; this schedule converges faster than the often-used inverse logarithmic cooling schedule [GG84]. As a criterion for stopping the annealing process, we simply put a limit on the search length $L$. Although this does not ensure convergence to the optimal correlation coefficient, the parameter values we obtain are generally good enough to solve the recognition task.
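To see why the inverse-linear schedule cools faster, the following sketch counts how many temperature updates each schedule needs to go from $T_0$ down to a target temperature. The logarithmic form $T_0/\ln(1 + j)$ is assumed here as one common way of writing the schedule of [GG84]; the function names are illustrative only.

```python
import math


def fast_schedule(t0, j):
    """Inverse-linear schedule T_j = T_0 / j (j >= 1), as used in the text."""
    return t0 / j


def log_schedule(t0, j):
    """Inverse logarithmic schedule T_j = T_0 / ln(1 + j) (assumed form)."""
    return t0 / math.log(1.0 + j)


def iterations_to_reach(schedule, t0, t_target, limit=10**7):
    """Smallest j >= 1 with schedule(t0, j) <= t_target, or None within limit."""
    for j in range(1, limit + 1):
        if schedule(t0, j) <= t_target:
            return j
    return None
```

Cooling from $T_0 = 1$ to $0.1$ takes 10 updates with $T_0/j$, but 22026 updates with $T_0/\ln(1 + j)$, because $\ln(1 + j) \ge 10$ requires $j \ge e^{10} - 1 \approx 22025.5$. A fixed search length $L$ of a few hundred iterations is therefore only meaningful with the faster schedule.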

As Kuperman et al. [KCPD90] point out, if the search problem involves different kinds of parameters, the annealing algorithm is rather analogous to the cooling of a mixture of liquids, each of which has a different freezing point. An algorithm that randomly perturbs all parameters at the same time has poor convergence properties. Therefore, at a specific temperature we do not combine the acceptance tests for the location, sampling, and rotation parameters; each of these groups is tested separately. We also obtain good results using simulated annealing only for the location parameters and a gradient descent procedure [CBK+93] for the remaining parameters, given large enough perturbations.
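The annealing loop, with the acceptance test above, the cooling schedule $T_j = T_0/j$, a fixed search length $L$, and a separate acceptance test for each parameter group, might be sketched as follows. The cost function, the perturbation rule, and the grouping are supplied by the caller; in the recognition task the cost would be $E = 1 - r$ for the template generated from the current parameters. All names are hypothetical, and the sketch also keeps the best state seen so far, which the text does not specify.

```python
import math
import random


def anneal(cost, start, groups, perturb, t0=1.0, length=300, seed=0):
    """Hypothetical sketch of simulated annealing with T_j = t0 / j.

    cost:    maps a parameter dict to E = 1 - r (lower is better)
    start:   initial parameter dict, e.g. {"cx": ..., "cy": ..., "angle": ...}
    groups:  tuples of parameter names that are perturbed and tested together
    perturb: perturb(name, value, rng) -> new value for one parameter
    Returns the best parameter dict found and its cost.
    """
    rng = random.Random(seed)
    current = dict(start)
    current_cost = cost(current)
    best, best_cost = dict(current), current_cost
    for j in range(1, length + 1):          # fixed search length L
        temperature = t0 / j                # inverse-linear cooling schedule
        for group in groups:                # one acceptance test per group
            candidate = dict(current)
            for name in group:
                candidate[name] = perturb(name, current[name], rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Accept improvements; accept worse moves if xi < exp(-delta / T_j).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = dict(current), current_cost
    return best, best_cost
```

For example, `groups=[("cx", "cy"), ("s_x", "s_y"), ("angle",)]` would test location, sampling, and rotation perturbations one group at a time at each temperature, instead of perturbing all parameters in a single move.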

To properly deal with the boundaries of an image $I(x, y)$ with $0 \le x < n_I$ and $0 \le y < m_I$, we use the following formula to map a perturbed $x$-coordinate $c_x$ of the centroid position of a template with radius $R_T$ in image $I$:

$$c_x \leftarrow \begin{cases} c_x & \text{if } c_x - R_T > 0 \text{ and } c_x + R_T < n_I,\\ -c_x & \text{if } c_x + R_T < 0 \text{ and } c_x - R_T > -n_I,\\ 2 n_I - c_x & \text{if } c_x - R_T > n_I \text{ and } c_x + R_T < 2 n_I,\\ n_I / 2 & \text{otherwise (unlikely perturbation).} \end{cases}$$

The $y$-coordinate $c_y$ of the centroid of the template is perturbed similarly. This formula avoids attracting the centroid position to the rim or corners of the image.
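The boundary rule above translates directly into a small function. This sketch implements the four cases of the formula for one axis; the name `reflect_centroid` is hypothetical, and the same function would be applied to $c_y$ with $m_I$ in place of $n_I$.

```python
def reflect_centroid(c, radius, n):
    """Map a perturbed centroid coordinate c of a template with radius R_T
    back into an image axis of length n (valid pixels 0 <= x < n)."""
    if c - radius > 0 and c + radius < n:
        return c              # template lies inside the image: keep it
    if c + radius < 0 and c - radius > -n:
        return -c             # fell off the low edge: mirror about 0
    if c - radius > n and c + radius < 2 * n:
        return 2 * n - c      # fell off the high edge: mirror about n
    return n / 2              # otherwise (unlikely): restart at the centre
```

For an axis of length 100 and $R_T = 3$, a proposal of −10 is mirrored to 10 and a proposal of 110 to 90, whereas a proposal of 1, whose template would straddle the edge, falls into the last case and is reset to 50. Mirroring rather than clamping is what keeps proposals from piling up on the rim.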

5 Experimental Results

The algorithm described above was implemented on a Sun workstation and on a Silicon Graphics Iris. We used the model images shown in Figure 4 to find templates that correlate optimally with the scene images shown in Figure 5. The images are quantized using 256 grey levels. The size of the model images is 122 × 117 pixels (except for the one-way sign, which has 178 × 60 pixels). The size of the scene images varies between 100 × 70 and 516 × 365 pixels.

For all scene images, the shape, size, orientation, and location of any traffic sign are found if it is known a priori what kind of sign to look for. For example, using the stop sign model shown in Figure 4, the algorithm finds the stop sign in a complicated scene image like image Stop5 (the second image in the last row of images in Figure 5; see also Figure 6). The stop sign in scene image Stop5 is recognized although the stop sign model was constructed from a picture of a completely different stop sign. Note that the stop sign in image Stop5 has graffiti, while the model sign does not.

For the more general problem of recognizing which object is in a scene image (i.e., not knowing the kind of traffic sign a priori), we ran 144 experiments with 18 scene images and 8 model images. Table 1 contains the correlation values obtained in the experiments. For each scene image, our algorithm computes the highest correlation coefficient among the values obtained for the individual models (the row maxima in Table 1). The model corresponding to the maximum correlation value is selected as the sign recognized in the scene image. For most scene images, the correlation coefficient is highest if a match between a sign in the image and its corresponding template occurs. Only for three images, Slow2, Stop4, and Stop5, does a false positive match occur, because the best correlation coefficient is not the one for the corresponding model. We show the templates causing these false


Figure 4: Model images used in experiments: Footpath, E-no-entry, No-entry, One-way, Priority, Slow, Stop, and Yield.


positive matches in Figure 6.
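The decision rule, selecting for each scene image the model with the largest correlation value, is sketched below on rows copied from Table 1. The helper names are illustrative; `rank_of` computes the rank of a given model's value within a row, which is how the "2nd" and "3rd" annotations in Table 1 can be read.

```python
# Column order of Table 1.
MODELS = ["Footpath", "E-no-entry", "No-entry", "One-way",
          "Priority", "Slow", "Stop", "Yield"]

# Rows copied from Table 1 (scene images Slow1, Slow2, and Stop5).
SLOW1 = [0.25, 0.29, 0.25, 0.25, 0.45, 0.74, 0.15, 0.38]
SLOW2 = [0.38, 0.48, 0.39, 0.39, 0.32, 0.56, 0.21, 0.58]
STOP5 = [0.43, 0.73, 0.44, 0.48, 0.29, 0.31, 0.51, 0.65]


def recognize(correlations):
    """Return the model whose template gave the highest correlation."""
    best = max(range(len(MODELS)), key=lambda k: correlations[k])
    return MODELS[best]


def rank_of(correlations, model):
    """1-based rank of the given model's correlation within the row."""
    value = correlations[MODELS.index(model)]
    return 1 + sum(1 for v in correlations if v > value)
```

`recognize(SLOW1)` selects Slow, a correct recognition, while `recognize(SLOW2)` selects Yield and `recognize(STOP5)` selects E-no-entry, the false positive matches discussed in the text; the correct models rank 2nd and 3rd in those rows.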

Two factors contribute to the false positive matches. First, some models do not have enough structure by themselves and match easily with arbitrary parts of the images. For example, the European no-entry sign's white middle bar matches the roof of a car in image Stop5, as shown in Image 5 of Figure 6. In Section 6 we analyze this problem quantitatively. Second, some models look quite different from the actual landmark in the scene image. For example, as mentioned before, the stop sign model does not have any graffiti, while the signs in Stop4 and Stop5 do. The templates constructed from the model stop sign do not match the stop signs in images Stop4 and Stop5 well enough to result in a correlation coefficient larger than the one obtained with the model E-no-entry (see Images 4 and 5 of Figure 6). One could try to solve this problem by making a model of each traffic sign (including its graffiti) in the environment. However, this would result in a huge library of signs, which would increase the search time substantially. Moreover, the environment may change and quickly make the library outdated. Therefore, we instead propose to select a small number of model images with high information content (see Section 6) so that false positive matches are avoided.


5.1 Illumination Changes

The correlation coefficient $\rho(I_T, T)$ measures not only how accurately image $I_T$ can be approximated by template $T$, but also how accurately $I_T$ can be approximated by a linear function of $T$, since $\rho(I_T, T) = \rho(I_T, aT + b)$ for any constants $a > 0$ and $b$. Therefore, the correlation coefficient is invariant to brightness changes that multiply the image by a constant positive factor and add a constant offset. Thus recognition is not affected by new lighting conditions that mainly result in such brightness changes.

5.2 Simulated Annealing vs. Exhaustive Search

We also implemented an exhaustive search of the entire parameter space to compare its running time to our fast simulated annealing algorithm. The comparison of our simulated annealing algorithm and exhaustive search drastically demonstrates the advantage of simulated annealing. We used image No-entry2, which has 112 × 77 pixels. The search space had about $6.8 \times 10^7$ sets of parameters. It took 15 seconds to recognize the sign using our simulated annealing algorithm. In contrast, exhaustive search found the sign after more than 10 hours of computation time.

Figure 7 illustrates how fast our simulated annealing algorithm recognizes a sign in a scene image.

Figure 7: A typical run of our simulated annealing algorithm. The sign is found after about 300 iterations (ca. 18 s).


6 Avoiding False Matches

The error in the sampled coefficient of correlation $r$ increases as the number of pixels $p_T$ in the image window considered decreases. For large samples of $p_T$ pixels, the error of $r$ can be expressed as the mean squared error (MSE)

$$\mathrm{MSE}(r) \approx \frac{(1 - \rho^2)^2}{p_T}$$

(see Figure 8 and Weatherburn [Wea62]). As Weatherburn points out, the sampling distribution of $r$ is never

Figure 5: Scene images used in recognition experiments. The images are named by the sign in the scene and a number if the same sign is in more than one scene image. Reading left to right, the images are: Footpath, E-no-entry, No-entry 1 & 2, One-way, Priority 1, 2, & 3, Slow 1, 2, 3, & 4, Stop 1, 2, 3, 4, & 5, and Yield 1 & 2.


Figure 6: False positive matches. Images 1 and 2 show templates constructed from models Slow and Yield overlying the sign in image Slow2 (correlation values 0.56 and 0.58, respectively). Images 3 and 4 are cropped images of Stop4 and Stop5 illustrating the best match with templates made from the Stop model. For images Stop4 and Stop5, we obtain better correlation values using models E-no-entry and Yield. Cropped versions of image Stop5 illustrating these false positive matches are shown in Images 5 and 6.


TABLE 1

Correlation Values for Recognition Task (rows: scene images; columns: models)

Image        Footpath    E-no-entry  No-entry    One-way     Priority    Slow        Stop        Yield
Footpath     0.77*       0.59        0.38        0.37        0.46        0.29        0.35        0.62
E-no-entry   0.49        0.73*       0.39        0.43        0.46        0.26        0.38        0.62
No-entry1    0.22        0.21        0.67*       0.31        0.24        0.18        0.17        0.40
No-entry2    0.29        0.18        0.84*       0.37        0.14        0.26        0.23        0.35
One-way      0.37        0.55        0.24        0.70*       0.40        0.38        0.31        0.58
Priority1    0.36        0.49        0.34        0.35        0.58*       0.32        0.30        0.44
Priority2    0.46        0.54        0.40        0.45        0.66*       0.29        0.32        0.31
Priority3    0.37        0.57        0.40        0.39        0.62*       0.34        0.37        0.56
Slow1        0.25        0.29        0.25        0.25        0.45        0.74*       0.15        0.38
Slow2        0.38        0.48        0.39        0.39        0.32        0.56 (2nd)  0.21        0.58*
Slow3        0.39        0.58        0.41        0.38        0.40        0.62*       0.30        0.59
Stop1        0.41        0.47        0.42        0.30        0.22        0.25        0.69*       0.58
Stop2        0.23        0.16        0.27        0.25        0.18        0.11        0.38*       0.30
Stop3        0.26        0.20        0.33        0.19        0.13        0.00        0.34*       0.19
Stop4        0.42        0.73*       0.46        0.50        0.43        0.32        0.56 (3rd)  0.66
Stop5        0.43        0.73*       0.44        0.48        0.29        0.31        0.51 (3rd)  0.65
Yield1       0.45        0.75        0.39        0.50        0.53        0.32        0.37        0.78*
Yield2       0.42        0.73        0.39        0.50        0.43        0.32        0.36        0.82*

* Highest correlation value for the scene image. (2nd), (3rd): rank of the correct model's value where it is not the highest.
even approximately normal. The probability curve is very skewed in the neighborhood of $\rho = \pm 1$, even for large samples.

Figure 8: Mean squared error of $r$ for $p_T$ = 100, 400, and 2500, plotted against the correlation coefficient $\rho$.

The normalized auto-correlation of model image $M(x, y)$ is

$$R(\tau_x, \tau_y) = \frac{\sum_x \sum_y M(x, y)\, M(x - \tau_x,\ y - \tau_y)}{\sum_x \sum_y \big(M(x, y)\big)^2}.$$

The faster the auto-correlation falls off, the higher the resolution of the model image. Examples of auto-correlation images are shown in Figure 9. The resolution of a given model image can be measured with a single number, the coherence area

$$A = \sum_{\tau_x} \sum_{\tau_y} \big(R(\tau_x, \tau_y)\big)^2.$$

Given the coherence area $A$ and the number of pixels $n$ of the model image, the number of coherence cells is $c = n/A$.

The number of coherence cells is equivalent to the number of degrees of freedom of the model image. It can be used as a measure of the information content of the model image.

We examine the information content of each model image to evaluate how useful the model image is for the recognition task. All our model images $M(x, y)$ have the same number of pixels $n$. Model images with low resolution (little structure), such as the European No-entry and Yield signs, do not have enough information content for robust object recognition. This, together with the mean squared error in $r$ for small $p_T$, is responsible for the false positive matches reported in Table 1. In order to avoid false matches, we need to avoid using such model images with low information content.

The models that contribute to the false positive matches, E-no-entry and Yield, have coherence areas of 313 and 197, respectively. These are much larger than the coherence areas of models with more reliable matching results. For example, the auto-correlations of the Footpath and Stop signs fall off much faster; their coherence areas are 148 and 56, respectively. The number of coherence cells is 297 in E-no-entry and 473 in Yield, but 628 in Footpath and as many as 1641 in Stop.

Thus, the number of coherence cells is a quantitative measure for determining whether a model has enough information content to be useful as a template. Most of the models we use have a large enough number of coherence cells for robust detection, but subsequent downsampling in the generation of templates may reduce this number.

7 Results on Noisy Images

Gaussian noise is added to the brightness values of some of the scene images to examine the robustness of our algorithm. The algorithm is able to find the sign even in strongly degraded pictures. The signal-to-noise ratio (SNR) of a noisy image is defined as ten times the base-10 logarithm of the ratio of the variance of the noisy image to the variance of the noise:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma^2_{\text{noisy}}}{\sigma^2_{\text{noise}}}\ \text{dB}.$$

Several noisy images are obtained by corrupting image Slow3 with zero-mean Gaussian noise at various signal-to-noise ratios. Our results for image Slow3 are summarized in Figure 10. Note that the correlation increases as the signal-to-noise ratio increases.


Figure 10: Correlation coefficient for sign recognition in noisy versions of image Slow3, plotted against the SNR of noisy Slow3 in dB.

Figure 11 shows images Slow3 and Slow4 corrupted by Gaussian noise with zero mean and SNRs of 3 dB and 5 dB, respectively. Matches for pictures with much lower SNR are possible for templates with a much larger number of pixels and more information content than those presented. (In radar and sonar, signals with negative SNR are commonly extracted given sufficient information content.)
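The SNR definition of this section can be used to generate test images at a chosen SNR. The sketch below assumes independent zero-mean Gaussian noise, so that the variance of the noisy image is approximately the image variance plus the noise variance; it therefore solves $10^{\mathrm{SNR}/10} = 1 + \sigma^2_{\text{image}} / \sigma^2_{\text{noise}}$ for the noise standard deviation. Pixels are flat lists of floats, no clipping to the 256 grey levels is performed, and all function names are hypothetical.

```python
import math
import random


def variance(values):
    """Population variance of a flat list of pixel values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def snr_db(noisy, noise):
    """SNR = 10 log10(var(noisy image) / var(noise)), as defined in the text."""
    return 10.0 * math.log10(variance(noisy) / variance(noise))


def noise_sigma_for_snr(signal_var, target_db):
    """Noise standard deviation giving the target SNR, assuming the noise is
    independent of the image so that var(noisy) ~ var(image) + var(noise)."""
    ratio = 10.0 ** (target_db / 10.0)
    if ratio <= 1.0:
        raise ValueError("with this SNR definition the target must exceed 0 dB")
    return math.sqrt(signal_var / (ratio - 1.0))


def add_gaussian_noise(pixels, target_db, seed=0):
    """Corrupt pixels with zero-mean Gaussian noise at roughly target_db."""
    rng = random.Random(seed)
    sigma = noise_sigma_for_snr(variance(pixels), target_db)
    noise = [rng.gauss(0.0, sigma) for _ in pixels]
    noisy = [p + e for p, e in zip(pixels, noise)]
    return noisy, noise
```

Note that with this definition the variance of the noisy image always includes the noise variance, so the measured SNR is positive in expectation; the negative SNRs mentioned for radar and sonar refer to the more common signal-to-noise definition.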

8 Conclusions

Our method has been shown to efficiently recognize objects in complicated landscapes in the presence of noise. To our knowledge, our work is the first to apply fast simulated annealing to object recognition. Our results show that it makes the parameter search of object recognition feasible.

We strongly advocate the use of template matching in recognition tasks and provide quantitative techniques to analyze its limits. We show how to measure the information content of templates as a way to make the recognition algorithm robust.
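The information-content measure of Section 6 (normalized auto-correlation, coherence area, and number of coherence cells) can be sketched as a brute-force computation. Assumptions not fixed by the text: the shifts $(\tau_x, \tau_y)$ range over all offsets for which the shifted model still overlaps the original, pixels shifted outside the image count as zero, and the quadratic running time is acceptable only for small images (an FFT-based auto-correlation would be the practical choice for 122 × 117 models). Function names are illustrative.

```python
def autocorrelation(model, tau_x, tau_y):
    """Normalized auto-correlation R(tau_x, tau_y) of a model image given as a
    list of rows; pixels shifted outside the image are treated as zero."""
    h, w = len(model), len(model[0])
    energy = sum(v * v for row in model for v in row)
    total = 0.0
    for y in range(h):
        for x in range(w):
            xs, ys = x - tau_x, y - tau_y
            if 0 <= xs < w and 0 <= ys < h:
                total += model[y][x] * model[ys][xs]
    return total / energy


def coherence_area(model):
    """A = sum over all overlapping shifts of R(tau_x, tau_y)^2 (brute force)."""
    h, w = len(model), len(model[0])
    return sum(autocorrelation(model, tx, ty) ** 2
               for ty in range(-(h - 1), h)
               for tx in range(-(w - 1), w))


def coherence_cells(model):
    """Number of coherence cells c = n / A, with n the number of pixels."""
    n = len(model) * len(model[0])
    return n / coherence_area(model)
```

A single bright pixel gives $A = 1$ and hence one coherence cell per pixel, whereas a constant image has a slowly decaying auto-correlation, a large coherence area, and few coherence cells; this is the same contrast the text draws between the Stop model (1641 cells) and the E-no-entry model (297 cells).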

For the application to traffic signs, we have shown that the search space can be successfully reduced by using a three-parameter transformation from model image to template. This method is well suited for recognition


Figure 11: The first and third images are images Slow3 and Slow4 degraded by Gaussian noise with zero mean and SNR 3 dB and 5 dB, respectively. The second and fourth images show that the object is recognized: the computed templates are shown overlying the recognized sign in the scene. (These images are shown brighter so that the overlying template can be seen better.)


tasks that involve objects with scale and shape variations. The method is so efficient that templates can be constructed on-line during the search.

For future work, severe illumination variations within the object and occlusion problems can be addressed. Other applications of our method, for example in medical computer vision and in face recognition, are being investigated. A recent paper by Brunelli and Poggio [BP93] reports successful face recognition using template matching. The authors normalize their test images by fixing the direction of the eye-to-eye axis and the interocular distance. The locations of the masks for the eye, nose, mouth, and face templates are also fixed. We believe that we can generalize Brunelli and Poggio's application to recognize faces in images that are not normalized but contain more general scenes with varied backgrounds.